Gene Onoltogy Analysis on iPSCs Ancestral

1. Environment Set Up

for (i in 1:length(params))
  print(paste('Parameter:', names(params)[i], ' - Value:', params[[i]], '- Class:', class(params[[i]])))
## [1] "Parameter: Dataset  - Value: CHD2_iPSCs_and_organoids_PublicRepo - Class: character"
## [1] "Parameter: SEFile  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/iPSCs/Output/Savings/ipsc.SE_deseq2_AR.rds - Class: character"
## [1] "Parameter: DEAList  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/iPSCs/Output/Savings/ipsc.DEAList_AR.rds - Class: character"
## [1] "Parameter: AR  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/iPSCs/Output/Savings/ipsc.deseqARvsWT.rds - Class: character"
## [1] "Parameter: SavingFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/iPSCs/Output/Savings/ - Class: character"
## [1] "Parameter: FiguresFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/iPSCs/Output/Figures/ - Class: character"
## [1] "Parameter: FDRthr  - Value: 0.05 - Class: numeric"
## [1] "Parameter: logFCthr  - Value: 0.55 - Class: numeric"
## [1] "Parameter: TopGO  - Value: BP_MF_CC - Class: character"
## [1] "Parameter: GoEnTh  - Value: 1 - Class: numeric"
## [1] "Parameter: GoPvalTh  - Value: 0.05 - Class: numeric"
## [1] "Parameter: NbName  - Value: TopGO_iPSCs_AR - Class: character"
## [1] "Parameter: SaveImages  - Value: FALSE - Class: logical"
  • Dataset: name of the dataset that is processed.
  • SE: input file containing differential expression results from DESeq2 (absolute path).
  • FDRthr: threshold on False Discovery Rate. Default 0.05.
  • logFCthr: threshold on False Discovery Rate. Default 1.5.
  • SavingFolder: directory where produced files will be written (absolute path). Default is getwd().
  • TopGO: string that specify the ontology domains to be analysed. Default BP and MF; also CC can be added.
library(RNASeqBulkExploratory)
library(SummarizedExperiment)
library(tidyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(topGO)
library(sechm)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(cowplot)
source('/group/testa/Users/oliviero.leonardi/myProjects/CHD2/BulkRNAseq/ContainerHome/CHD2_organoids/NoGradientBarplots.R')
Dataset <- params$Dataset
logFCthr <- params$logFCthr
FDRthr <- params$FDRthr

FdrTh <- FDRthr
logFcTh <- logFCthr

SavingFolder <- ifelse(is.null(params$SavingFolder), getwd(), params$SavingFolder)
FiguresFolder <- ifelse(is.null(params$FiguresFolder), getwd(), params$FiguresFolder)


if (dir.exists(SavingFolder) == FALSE) {
  dir.create(SavingFolder, recursive=TRUE)
}

2. Data Upload

  • Summarized Experiment object containing expression data used for DEA and gene and sample metadata
  • DEA object, containing results of the differential expression

2.1 Load Data from DEA

#SE object coming from DEA, but not containing specific contrast results
SE_DEA <- readRDS(params$SEFile)

SE_DEA <- SE_DEA[rowData(SE_DEA)$GeneName != '', ]
rownames(SE_DEA) <- rowData(SE_DEA)$GeneName

# List with differential expression results (all time-points)
DEA <- readRDS(params$DEAList)
colvector <- c("#5ec962", "#e95462", "#2c728e")
names(colvector) <- c('All', 'Up',  'Down')

2.2 Add DEA results to SE

if(! identical(rownames(SE_DEA), row.names(DEA$AR$res))){
  stop('Expression data in SE and results from differential espression analysis are inconsistent.')
}

SE_DEA <- mergeDeaSE(SE_DEA, DEA$AR$res, subsetRowDataCols=NULL,
                     logFcCol='log2FoldChange', FdrCol='padj') #specify
## Renaming " log2FoldChange " to "logFC"
## Renaming "  padj  " to "FDR"

16616 genes in 6 samples have been testes for differential expression.

The following number of genes are identified as differentially expressed:

  • FDR < 0.1: 29 differentially expressed genes
  • FDR < 0.05: 25 differentially expressed genes
  • FDR < 0.05 and FC > 1.5: 25 differentially expressed genes
  • FDR < 0.05 and FC > 2: 23 differentially expressed genes

Imposing a threshold of 0.55 on the Log2FC and 0.05 on the FDR (as specified in parameters), 25 genes are selected: 3 up-regulated genes and 23 down-regulated genes.


3. RESULTS NAVIGATION: Interactive Table

An interactive table show the results for all DEGs (ranked according to FDR). A table of all DEGs can be downloaded from here.

From topTag I generate an interactive table for result interrogation with link to the Gene Cards. The table reports all the genes having a FDR < 0.05 and a Log2FC > 0.55 as absolute value, according to the threshold settings.

DEGsTable(SE_DEA, FdrTh=FdrTh, logFcTh=logFCthr, maxGenes=Inf, saveDEGs=FALSE)

4. RESULTS VISUALIZATION

4.1 Volcano plot

The results of the differential expression analysis are visualized by Volcano plot. An interactive version is included in the html (only genes with FDR < threshold), while a static version is saved.

plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = FALSE)
## Warning: Removed 16591 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).


plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = TRUE)

4.2 Heatmap for significant genes

Heatmaps for DEGs, showing scaled vst values.

DEGs <- dplyr::filter(data.frame(rowData(SE_DEA)), FDR < FDRthr & abs(logFC) > log2(logFCthr))
ScaledCols <- c('darkblue', "purple","white","lightgoldenrod1", 'goldenrod1')
# sechm::sechm(SE_DEA, genes=DEGs$GeneName, assayName="vst", gaps_at="Genotype", show_rownames=FALSE,
#       top_annotation=c('Genotype'), hmcols=ScaledCols, show_colnames=TRUE,
#       do.scale=TRUE, breaks=0.85, column_title = "Scaled Vst Values")

5. TOPGO for Gene Ontology Enrichment analysis

Gene ontology enrichment analysis is performed on the set of 25 genes using TopGO with Fisher statistics and weight01 algorithm.

For each specified domain of the ontology:

  • Enrichment analysis on all DEGs or splitted in down- and up-regulated

5.1 Selection of modulated genes and generation of gene vectors

I generate vectors for the gene universe, all modulated genes, up-regulated genes and down-regulated genes in the format required by TopGo.

GeneVectors <- topGOGeneVectors(SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr)
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1

Therefore:

  • universe genes: 16616 genes
  • modulated genes: 25 genes
  • down-regulated genes: 23 genes of interest
  • up-regulated genes: 2 genes of interest

Then I set parameters according to the gene ontology domains to be evaluated. By default, Biological Process and Molecular Function domains are interrogated.

BpEval <- ifelse(length(grep('BP', params$TopGO))!=0, TRUE, FALSE)
MfEval <- ifelse(length(grep('MF', params$TopGO))!=0, TRUE, FALSE)
CcEval <- ifelse(length(grep('CC', params$TopGO))!=0, TRUE, FALSE)

5.2 TopGO analysis: Biological Process

On the basis of the analysis settings, the enrichment for Biological Process IS performed.

Biological Process Analysis for ALL modulated genes: 25 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
BPannAR <- topGO::annFUN.org(whichOnto="BP", feasibleGenes=names(GeneVectors$DEGenes), 
                    mapping="org.Hs.eg.db", ID="symbol") %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResBPAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=BPannAR, ontology='BP', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='BPAllAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11568 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15221 GO terms and 34588 relations. )
## 
## Annotating nodes ...............
##  ( 13536 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 611 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 14:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 13:  7 nodes to be scored    (21 eliminated genes)
## 
##   Level 12:  14 nodes to be scored   (58 eliminated genes)
## 
##   Level 11:  18 nodes to be scored   (1172 eliminated genes)
## 
##   Level 10:  33 nodes to be scored   (3034 eliminated genes)
## 
##   Level 9:   54 nodes to be scored   (3865 eliminated genes)
## 
##   Level 8:   70 nodes to be scored   (5031 eliminated genes)
## 
##   Level 7:   85 nodes to be scored   (6310 eliminated genes)
## 
##   Level 6:   114 nodes to be scored  (8325 eliminated genes)
## 
##   Level 5:   99 nodes to be scored   (10031 eliminated genes)
## 
##   Level 4:   60 nodes to be scored   (11961 eliminated genes)
## 
##   Level 3:   36 nodes to be scored   (12905 eliminated genes)
## 
##   Level 2:   14 nodes to be scored   (13195 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13273 eliminated genes)

Biological Process Analysis for DOWN-REGULATED genes: 23 genes

# Wrapper function for topGO analysis (see helper file)
ResBPDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=BPannAR, ontology='BP', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='BPDownAR', outDir=paste0(SavingFolder)) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11568 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15221 GO terms and 34588 relations. )
## 
## Annotating nodes ...............
##  ( 13536 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 571 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 14:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 13:  7 nodes to be scored    (21 eliminated genes)
## 
##   Level 12:  13 nodes to be scored   (58 eliminated genes)
## 
##   Level 11:  17 nodes to be scored   (1172 eliminated genes)
## 
##   Level 10:  30 nodes to be scored   (2968 eliminated genes)
## 
##   Level 9:   48 nodes to be scored   (3769 eliminated genes)
## 
##   Level 8:   63 nodes to be scored   (4524 eliminated genes)
## 
##   Level 7:   80 nodes to be scored   (5810 eliminated genes)
## 
##   Level 6:   108 nodes to be scored  (7841 eliminated genes)
## 
##   Level 5:   93 nodes to be scored   (9758 eliminated genes)
## 
##   Level 4:   57 nodes to be scored   (11759 eliminated genes)
## 
##   Level 3:   34 nodes to be scored   (12605 eliminated genes)
## 
##   Level 2:   14 nodes to be scored   (13184 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13269 eliminated genes)

# Selection on enrichment of at least 2 is implemented (also to avoid depleted categories). Then categories are ranked by PVal and all the ones with Pval < th are selected. If the number is < minTerms, othter terms are included to reach the minimum number. 
GOTable(ResBPDownAR$ResSel, maxGO=20)

Biological Process Analysis for UP-REGULATED genes: 2 genes

ResBPUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=BPannAR, ontology='BP', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='BPUpAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11568 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15221 GO terms and 34588 relations. )
## 
## Annotating nodes ...............
##  ( 13536 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 94 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  3 nodes to be scored    (110 eliminated genes)
## 
##   Level 9:   6 nodes to be scored    (197 eliminated genes)
## 
##   Level 8:   10 nodes to be scored   (814 eliminated genes)
## 
##   Level 7:   10 nodes to be scored   (1580 eliminated genes)
## 
##   Level 6:   15 nodes to be scored   (2765 eliminated genes)
## 
##   Level 5:   19 nodes to be scored   (5325 eliminated genes)
## 
##   Level 4:   12 nodes to be scored   (7549 eliminated genes)
## 
##   Level 3:   10 nodes to be scored   (10810 eliminated genes)
## 
##   Level 2:   6 nodes to be scored    (11901 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (12410 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/BPUp'), recursive=TRUE)
#GOAnnotation(ResBPUp$ResSel, GOdata=ResBPUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/BPUp'), keytype='SYMBOL')
GOTable(ResBPUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResBPAllAR$ResSel, TopGOResDown=ResBPDownAR$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResBPAllAR$ResSel, TopGOResDown = ResBPDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.3 TopGO analysis: Molecular Function

On the basis of the analysis settings, the enrichment for Molecular Function IS performed.

Molecular Function Enrichment for ALL modulated genes: 25 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
MFannAR <- topGO::annFUN.org(whichOnto='MF', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResMFAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=MFannAR, ontology='MF', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='MFAllAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4142 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4603 GO terms and 5944 relations. )
## 
## Annotating nodes ...............
##  ( 13956 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 105 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 10:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 9:   3 nodes to be scored    (0 eliminated genes)
## 
##   Level 8:   9 nodes to be scored    (941 eliminated genes)
## 
##   Level 7:   14 nodes to be scored   (1149 eliminated genes)
## 
##   Level 6:   15 nodes to be scored   (1309 eliminated genes)
## 
##   Level 5:   18 nodes to be scored   (1608 eliminated genes)
## 
##   Level 4:   25 nodes to be scored   (3294 eliminated genes)
## 
##   Level 3:   13 nodes to be scored   (7327 eliminated genes)
## 
##   Level 2:   5 nodes to be scored    (9225 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13454 eliminated genes)

Molecular Function Enrichment for DOWN-REGULATED genes: 23 genes

ResMFDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=MFannAR, ontology='MF', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='MFDownAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4142 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4603 GO terms and 5944 relations. )
## 
## Annotating nodes ...............
##  ( 13956 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 103 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 10:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 9:   3 nodes to be scored    (0 eliminated genes)
## 
##   Level 8:   9 nodes to be scored    (941 eliminated genes)
## 
##   Level 7:   14 nodes to be scored   (1149 eliminated genes)
## 
##   Level 6:   15 nodes to be scored   (1309 eliminated genes)
## 
##   Level 5:   18 nodes to be scored   (1608 eliminated genes)
## 
##   Level 4:   24 nodes to be scored   (3294 eliminated genes)
## 
##   Level 3:   12 nodes to be scored   (7327 eliminated genes)
## 
##   Level 2:   5 nodes to be scored    (9171 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13444 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFDown'), recursive=TRUE)
#GOAnnotation(ResMFDown$ResSel, GOdata=ResMFDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFDown'), keytype='SYMBOL')
GOTable(ResMFDownAR$ResSel, maxGO=20)

Molecular Function Analysis for UP-REGULATED genes: 2 genes

ResMFUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=MFannAR, ontology='MF', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='MFUpAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4142 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4603 GO terms and 5944 relations. )
## 
## Annotating nodes ...............
##  ( 13956 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 5 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 4:   1 nodes to be scored    (0 eliminated genes)
## 
##   Level 3:   2 nodes to be scored    (0 eliminated genes)
## 
##   Level 2:   1 nodes to be scored    (224 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (11430 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFUp'), recursive=TRUE)
#GOAnnotation(ResMFUp$ResSel, GOdata=ResMFUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFUp'), keytype='SYMBOL')
GOTable(ResMFUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResMFAllAR$ResSel, TopGOResDown=ResMFDownAR$ResSel, 
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResMFAllAR$ResSel, TopGOResDown = ResMFDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.4 TopGO analysis: Cellular Component

On the basis of the analysis settings, the enrichment for Cellular Component IS performed.

Cellular Component Enrichment for ALL modulated genes: 25 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
CCannAR <- topGO::annFUN.org(whichOnto='CC', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResCCAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=CCannAR, ontology='CC', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='CCAllAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1699 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1921 GO terms and 3231 relations. )
## 
## Annotating nodes ...............
##  ( 14191 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 91 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  8 nodes to be scored    (0 eliminated genes)
## 
##   Level 9:   9 nodes to be scored    (72 eliminated genes)
## 
##   Level 8:   13 nodes to be scored   (499 eliminated genes)
## 
##   Level 7:   11 nodes to be scored   (1457 eliminated genes)
## 
##   Level 6:   14 nodes to be scored   (5828 eliminated genes)
## 
##   Level 5:   10 nodes to be scored   (7323 eliminated genes)
## 
##   Level 4:   10 nodes to be scored   (10438 eliminated genes)
## 
##   Level 3:   12 nodes to be scored   (13355 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (13807 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14092 eliminated genes)

#write.table(ResCCAll$ResAll, file=paste0(SavingFolder, 'TopGO/CCAllResults.txt'), sep='\t', row.names=FALSE)

Cellular Component Enrichment for DOWN-REGULATED genes: 23 genes

# Wrapper function for topGO analysis (see helper file)
ResCCDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=CCannAR, ontology='CC', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='CCDownAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1699 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1921 GO terms and 3231 relations. )
## 
## Annotating nodes ...............
##  ( 14191 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 91 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  8 nodes to be scored    (0 eliminated genes)
## 
##   Level 9:   9 nodes to be scored    (72 eliminated genes)
## 
##   Level 8:   13 nodes to be scored   (499 eliminated genes)
## 
##   Level 7:   11 nodes to be scored   (1457 eliminated genes)
## 
##   Level 6:   14 nodes to be scored   (5828 eliminated genes)
## 
##   Level 5:   10 nodes to be scored   (7323 eliminated genes)
## 
##   Level 4:   10 nodes to be scored   (10438 eliminated genes)
## 
##   Level 3:   12 nodes to be scored   (13355 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (13807 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14092 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCDown'), recursive=TRUE)
#GOAnnotation(ResCCDown$ResSel, GOdata=ResCCDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCDown'), keytype='SYMBOL')
GOTable(ResCCDownAR$ResSel, maxGO=20)

Cellular Component Analysis for UP-REGULATED genes: 2 genes

# Wrapper function for topGO analysis (see helper file)
ResCCUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=CCannAR, ontology='CC', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='CCUpAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1699 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1921 GO terms and 3231 relations. )
## 
## Annotating nodes ...............
##  ( 14191 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 12 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 7:   1 nodes to be scored    (0 eliminated genes)
## 
##   Level 6:   2 nodes to be scored    (0 eliminated genes)
## 
##   Level 5:   2 nodes to be scored    (897 eliminated genes)
## 
##   Level 4:   3 nodes to be scored    (6555 eliminated genes)
## 
##   Level 3:   2 nodes to be scored    (10965 eliminated genes)
## 
##   Level 2:   1 nodes to be scored    (11443 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (12601 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCUp'), recursive=TRUE)
#GOAnnotation(ResCCUp$ResSel, GOdata=ResCCUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCUp'), keytype='SYMBOL')
GOTable(ResCCUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResCCAllAR$ResSel, TopGOResDown=ResCCDownAR$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResCCAllAR$ResSel, TopGOResDown = ResCCDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`


7. Savings

Most of the useful information has been saved during the analysis. Here I save figures, workspace and information about the session.

if (params$SaveImages == TRUE){   #Just in case since eval only works when knitting
  #Set the folder paths
  from <- paste(getwd(), paste(params$NbName, 'files/figure-html', sep='_'), sep='/')
  to <- params$FiguresFolder

  #Copy to output directory
  file.copy(from, to, recursive = TRUE, copy.mode = TRUE)
}
SessionInfo <- sessionInfo()
Date <- date()

save.image(paste0(SavingFolder, '/ipsc.', 'FunctionalAnalysisWorkspace_AR.RData'))

last update on: Fri Jan 10 20:29:32 2025